Spatio-temporal layout of human actions for improved bag-of-words action detection

Authors

  • Gertjan J. Burghouts
  • Klamer Schutte
Abstract

We investigate how human action recognition can be improved by considering the spatio-temporal layout of actions. From the literature, we adopt a pipeline consisting of STIP features, a random forest to quantize the features into histograms, and an SVM classifier. Our goal is to detect 48 human actions, ranging from simple actions such as walk to complex actions such as exchange. Our contribution is to improve the performance of this pipeline by exploiting a novel spatio-temporal layout of the 48 actions. Here, each STIP feature in the video does not contribute to the histogram bins by a unity value, but rather by a weight given by its spatio-temporal probability. We propose 6 configurations of spatio-temporal layout, where the varied parameters are the coordinate system and the modeling of the action and its context. Our model of layout does not change any other parameter of the pipeline, requires no re-learning of the random forest, increases the size of the resulting representation by only a factor of two, and adds a minimal computational cost of only a handful of operations per feature. Extensive experiments demonstrate that the layout is distinctive of actions that involve trajectories, (dis)appearance, kinematics, and interactions. The visualization of each action's layout illustrates that our approach is indeed able to model the spatio-temporal patterns of each action. Each layout is experimentally shown to be optimal for a specific set of actions. Generally, the context has more effect than the choice of coordinate system. The most impressive improvements are achieved for complex actions involving items. For 43 out of 48 human actions, the performance is better or equal when spatio-temporal layout is included. In addition, we show that our method outperforms the state of the art on the IXMAS and UT-Interaction datasets.
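The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each feature already has a visual-word index (for example, from the random forest) and a layout probability in [0, 1]. It also assumes that the factor-of-two growth comes from concatenating an "action" histogram weighted by p with a "context" histogram weighted by 1 - p. The function names and the 1 - p context weighting are hypothetical choices made for illustration.

```python
import numpy as np


def weighted_bow_histogram(word_ids, weights, n_words):
    """Accumulate per-feature weights (instead of unit counts) into word bins."""
    hist = np.bincount(word_ids, weights=weights, minlength=n_words)
    total = hist.sum()
    # L1-normalize when possible; an all-zero histogram is returned unchanged.
    return hist / total if total > 0 else hist


def layout_histogram(word_ids, layout_prob, n_words):
    """Concatenate an action and a context histogram (2 * n_words bins).

    Assumption: the context weight is 1 - p. The abstract does not specify this.
    """
    word_ids = np.asarray(word_ids, dtype=int)
    p = np.clip(np.asarray(layout_prob, dtype=float), 0.0, 1.0)
    action = weighted_bow_histogram(word_ids, p, n_words)
    context = weighted_bow_histogram(word_ids, 1.0 - p, n_words)
    return np.concatenate([action, context])


# Toy example: four features quantized into three visual words.
word_ids = [0, 2, 2, 1]
layout_prob = [0.9, 0.5, 0.1, 0.0]  # hypothetical spatio-temporal probabilities
h = layout_histogram(word_ids, layout_prob, n_words=3)
print(h.shape)  # (6,)
```

The extra cost per feature is one multiplication and two additions into existing bins, which is in the spirit of the "handful of operations per feature" mentioned in the abstract. The word assignment itself is untouched, so the quantizer would not need retraining under these assumptions.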

Similar resources

Spatio-Temporal Proximity Distribution Kernels for Action Recognition

The spatio-temporal local-feature-based bag of visual words (BOVW) algorithm has shown promising results in complex human action classification. However, one key disadvantage of BOVW is its lack of geometric constraints, which makes it impossible to distinguish actions that share the same features but differ in the spatio-temporal distribution of those features. In this paper, we exploit the spatio-temporal ...

Spatio-temporal Co-Occurrence Characterizations for Human Action Classification

The human action classification task is a widely researched topic and is still an open problem. Many state-of-the-art approaches use bag-of-video-words with spatio-temporal local features to construct characterizations of human actions. In order to improve beyond this standard approach, we investigate the usage of co-occurrences between local features. We propose the usage of ...

Human Action Recognition Based on 3D Edge Oriented Gradient Histogram of Slide Blocks

In this paper, a new feature called 3D edge oriented gradient histogram of slide blocks is proposed for human action recognition, based on the idea that the slide area of the human body edge can be seen as a spatio-temporal silhouette surface when a human performs a certain action in a video. This feature is processed by defining dense 3D spatio-temporal slide blocks on the spatio-temporal silhouette...

Robust and efficient models for action recognition and localization. (Modèles robustes et efficaces pour la reconnaissance d'action et leur localisation)

This thesis addresses the problem of action recognition, i.e., how to determine the type of action that is happening in a video and its temporal localization. First, we consider the problem of video representation: how to encode videos in a robust way, such that the representation is suitable for a wide variety of action classes, tasks and video types. We present an extensive evaluation study t...

Action Recognition using Temporal Bag-of-Words from Depth Maps

In this paper, we present a methodology for human action recognition from a sequence of depth maps obtained using Microsoft Kinect. Specifically, we use a Temporal Bag-of-Words model as representation scheme to capture the variation of features across the temporal domain. Our methodology builds the Temporal Bag-of-Words model on top of the spatiotemporal features extracted from interest points....

Journal:
  • Pattern Recognition Letters

Volume 34

Publication date: 2013